Book Review: Cross-Language Information Retrieval by Jian-Yun Nie
Abstract
Cross-Language Information Retrieval is a compact book introducing a branch of information retrieval (IR) that has attracted considerable research interest since the dawn of the World Wide Web in the mid-1990s. Information retrieval is generally concerned with the problem of finding documents within a large collection that are relevant to a given input query. Whereas the original formulation of IR assumes that queries and documents are written in the same language, cross-language IR (CLIR) presumes instead that they are written in two different languages. If the collection contains documents in several languages, we speak of multilingual IR (MLIR), which is typically solved with multiple instances of CLIR. Recently, other variations on the theme have been proposed that address non-textual documents, such as image, music, and speech retrieval. An interesting application of CLIR is the retrieval of images that are provided with textual descriptions in any language. Computational linguists may be interested in CLIR for several reasons. CLIR is mainly about the optimal integration of machine translation (MT) and IR, and it presents peculiar and difficult translation issues when short queries are involved, which is the most common case. For such problems, interesting approaches have been developed and refined over time, building mainly on core statistical MT techniques (e.g., word alignment models, translation models) and various lexical resources (e.g., WordNet, dictionaries). In recent years, several books on IR have been published (e.g., Grossman and Frieder 2004; Manning, Raghavan, and Schütze 2008; Büttcher, Clarke, and Cormack 2010), each of which devoted at most a section or chapter to CLIR. As books dedicated to CLIR have so far been limited to edited collections of scientific papers (Grefenstette 1998), it was definitely time for the first monograph on the topic. Jian-Yun Nie’s volume is structured as five chapters, which are organized as follows:
Similar resources
Toward Cross-Language and Cross-Media Image Retrieval
This report describes the approach used in our participation in ImageCLEF. Our focus is on image retrieval using text, i.e., cross-media IR. To do this, we first determine the strong relationships between keywords and types of visual features. Then the subset of images retrieved by text retrieval is used as examples to match other images according to the most important types of features of the q...
CLIR using a Probabilistic Translation Model based on Web Documents
In this report, we describe the approach we used in the TREC-8 Cross-Language IR (CLIR) track. The approach is based on probabilistic translation models estimated from two parallel training corpora: one established manually, and the other built automatically from documents mined from the Web. We describe the principle of model building, the mining of parallel texts, as well as some preliminary ...
Word Pairs in Language Modeling for Information Retrieval
Previous language modeling approaches to information retrieval have focused primarily on single terms. The use of bigram models has been studied, but the restriction on word order and adjacency may not be justified for information retrieval. We propose a new language modeling approach to information retrieval that incorporates lexical affinities, or pairs of words that occur near each other, wi...
Cross-Language Information Retrieval
The search for information is no longer limited to the user's native language, but increasingly extends to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a language different from that of the query. In addition to the problems of monolingual information retrieval (IR), translation is...
Cross Language Information Retrieval : a Research
Cross-Language Information Retrieval (CLIR) has been a research sub-field for more than a decade now. The field has sparked three major evaluation efforts: the TREC Cross Language Track which currently focuses on the Arabic language, the Cross-Language Evaluation Forum (CLEF) – a spinoff from TREC covering many European languages, and the NTCIR Asian Language Evaluation (covering Chinese, Japan...